Goto

Collaborating Authors

 linear prediction


Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

Neural Information Processing Systems

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are positively curved. In this paper we ask whether there are other lucky settings when FTL achieves sublinear, small regret. In particular, we study the fundamental problem of linear prediction over a non-empty convex, compact domain. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: In this case, we prove that as long as the mean of the loss vectors have positive lengths bounded away from zero, FTL enjoys a logarithmic growth rate of regret, while, e.g., for polyhedral domains and stochastic data it enjoys finite expected regret. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the bound available for FTL.


A lattice filter model of the visual pathway

Neural Information Processing Systems

Early stages of visual processing are thought to decorrelate, or whiten, the incoming temporally varying signals. Motivated by the cascade structure of the visual pathway (retina lateral geniculate nucelus (LGN) primary visual cortex, V1) we propose to model its function using lattice filters - signal processing devices for stage-wise decorrelation of temporal signals. Lattice filter models predict neuronal responses consistent with physiological recordings in cats and primates. In particular, they predict temporal receptive fields of two different types resembling so-called lagged and non-lagged cells in the LGN. Moreover, connection weights in the lattice filter can be learned using Hebbian rules in a stage-wise sequential manner reminiscent of the neuro-developmental sequence in mammals.


On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

Neural Information Processing Systems

We provide sharp bounds for Rademacher and Gaussian complexities of (constrained) linear classes. These bounds make short work of providing a number of corollaries including: risk bounds for linear prediction (including settings where the weight vectors are constrained by either L_2 or L_1 constraints), margin bounds (including both L_2 and L_1 margins, along with more general notions based on relative entropy), a proof of the PAC-Bayes theorem, and L_2 covering numbers (with L_p norm constraints and relative entropy constraints). In addition to providing a unified analysis, the results herein provide some of the sharpest risk and margin bounds (improving upon a number of previous results). Interestingly, our results show that the uniform convergence rates of empirical risk minimization algorithms tightly match the regret bounds of online learning algorithms for linear prediction (up to a constant factor of 2).


Enhancing Deep Neural Network Training Efficiency and Performance through Linear Prediction

Ying, Hejie, Song, Mengmeng, Tang, Yaohong, Xiao, Shungen, Xiao, Zimin

arXiv.org Artificial Intelligence

Deep neural networks (DNN) have achieved remarkable success in various fields, including computer vision and natural language processing. However, training an effective DNN model still poses challenges. This paper aims to propose a method to optimize the training effectiveness of DNN, with the goal of improving model performance. Firstly, based on the observation that the DNN parameters change in certain laws during training process, the potential of parameter prediction for improving model training efficiency and performance is discovered. Secondly, considering the magnitude of DNN model parameters, hardware limitations and characteristics of Stochastic Gradient Descent (SGD) for noise tolerance, a Parameter Linear Prediction (PLP) method is exploit to perform DNN parameter prediction. Finally, validations are carried out on some representative backbones. Experiment results show that compare to the normal training ways, under the same training conditions and epochs, by employing proposed PLP method, the optimal model is able to obtain average about 1% accuracy improvement and 0.01 top-1/top-5 error reduction for Vgg16, Resnet18 and GoogLeNet based on CIFAR-100 dataset, which shown the effectiveness of the proposed method on different DNN structures, and validated its capacity in enhancing DNN training efficiency and performance.


Making Linear Predictions in PyTorch - MachineLearningMastery.com Making Linear Predictions in PyTorch - MachineLearningMastery.com

#artificialintelligence

Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […]


On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

Kakade, Sham M., Sridharan, Karthik, Tewari, Ambuj

Neural Information Processing Systems

We provide sharp bounds for Rademacher and Gaussian complexities of (constrained) linear classes. These bounds make short work of providing a number of corollaries including: risk bounds for linear prediction (including settings where the weight vectors are constrained by either $L_2$ or $L_1$ constraints), margin bounds (including both $L_2$ and $L_1$ margins, along with more general notions based on relative entropy), a proof of the PAC-Bayes theorem, and $L_2$ covering numbers (with $L_p$ norm constraints and relative entropy constraints). In addition to providing a unified analysis, the results herein provide some of the sharpest risk and margin bounds (improving upon a number of previous results). Interestingly, our results show that the uniform convergence rates of empirical risk minimization algorithms tightly match the regret bounds of online learning algorithms for linear prediction (up to a constant factor of 2). Papers published at the Neural Information Processing Systems Conference.


Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

Huang, Ruitong, Lattimore, Tor, György, András, Szepesvari, Csaba

Neural Information Processing Systems

The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are positively curved. In this paper we ask whether there are other "lucky" settings when FTL achieves sublinear, "small" regret. In particular, we study the fundamental problem of linear prediction over a non-empty convex, compact domain. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: In this case, we prove that as long as the mean of the loss vectors have positive lengths bounded away from zero, FTL enjoys a logarithmic growth rate of regret, while, e.g., for polyhedral domains and stochastic data it enjoys finite expected regret. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the bound available for FTL.


A lattice filter model of the visual pathway

Gregor, Karol, Chklovskii, Dmitri B.

Neural Information Processing Systems

Early stages of visual processing are thought to decorrelate, or whiten, the incoming temporally varying signals. Because the typical correlation time of natural stimuli, as well as the extent of temporal receptive fields of lateral geniculate nucleus (LGN) neurons, is much greater than neuronal time constants, such decorrelation must be done in stages combining contributions of multiple neurons. We propose to model temporal decorrelation in the visual pathway with the lattice filter, a signal processing device for stage-wise decorrelation of temporal signals. The stage-wise architecture of the lattice filter maps naturally onto the visual pathway (photoreceptors -> bipolar cells -> retinal ganglion cells -> LGN) and its filter weights can be learned using Hebbian rules in a stage-wise sequential manner. Moreover, predictions of neural activity from the lattice filter model are consistent with physiological measurements in LGN neurons and fruit fly second-order visual neurons. Therefore, the lattice filter model is a useful abstraction that may help unravel visual system function.


On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

Kakade, Sham M., Sridharan, Karthik, Tewari, Ambuj

Neural Information Processing Systems

We provide sharp bounds for Rademacher and Gaussian complexities of (constrained) linear classes. These bounds make short work of providing a number of corollaries including: risk bounds for linear prediction (including settings where the weight vectors are constrained by either $L_2$ or $L_1$ constraints), margin bounds (including both $L_2$ and $L_1$ margins, along with more general notions based on relative entropy), a proof of the PAC-Bayes theorem, and $L_2$ covering numbers (with $L_p$ norm constraints and relative entropy constraints). In addition to providing a unified analysis, the results herein provide some of the sharpest risk and margin bounds (improving upon a number of previous results). Interestingly, our results show that the uniform convergence rates of empirical risk minimization algorithms tightly match the regret bounds of online learning algorithms for linear prediction (up to a constant factor of 2).